Related papers: StencilFlow: Mapping Large Stencil Programs to Dis…

200 papers

Recent developments in High Level Synthesis tools have attracted software programmers to accelerate their high-performance computing applications on FPGAs. Even though it has been shown that FPGAs can compete with GPUs in terms of…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-10-16 · Hamid Reza Zohouri, Artur Podobas, Satoshi Matsuoka

Stencils represent a class of computational patterns where an output grid point depends on a fixed shape of neighboring points in an input grid. Stencil computations are prevalent in scientific applications engaging a significant portion of…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-03-24 · Jesmin Jahan Tithi, Fabrizio Petrini, Hongbo Rong, Andrei Valentin, Carl Ebeling

Finite-difference methods based on high-order stencils are widely used in seismic simulations, weather forecasting, computational fluid dynamics, and other scientific applications. Achieving HPC-level stencil computations on one…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-07-09 · Ryuichi Sai, John Mellor-Crummey, Jinfan Xu, Mauricio Araya-Polo

Stencil computation is one of the most important kernels in scientific computing. Nowadays, most stencil-driven scientific computing still relies heavily on supercomputers, suffering from expensive access, poor scalability, and…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-03-16 · Kun Li, Zhichun Li, Yuetao Chen, Zixuan Wang, Yiwei Zhang, Liang Yuan, Haipeng Jia, Yunquan Zhang, Ting Cao, Mao Yang

Iterative stencils are used widely across the spectrum of High Performance Computing (HPC) applications. Many efforts have been put into optimizing stencil GPU kernels, given the prevalence of GPU-accelerated supercomputers. To improve the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-05-15 · Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka

Stencil computation is one of the fundamental computing patterns in many application domains such as scientific computing and image processing. While there are promising studies that accelerate stencils on FPGAs, there lacks an automated…

Hardware Architecture · Computer Science · 2022-08-24 · Xingyu Tian, Zhifan Ye, Alec Lu, Licheng Guo, Yuze Chi, Zhenman Fang

It is well known that, to accelerate stencil codes on CPUs or GPUs and to exploit hardware caches and their lines, optimizers must find spatial and temporal locality of array accesses to harvest data-reuse opportunities. On FPGAs there is the…

Programming Languages · Computer Science · 2024-01-25 · Florian Mayer, Julian Brandner, Michael Philippsen

Dataflow devices represent an avenue towards saving the control and data movement overhead of Load-Store Architectures. Various dataflow accelerators have been proposed, but how to efficiently schedule applications on such devices remains…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-06-06 · Tiziano De Matteis, Lukas Gianinazzi, Johannes de Fine Licht, Torsten Hoefler

Stencil computation is one of the most widely-used compute patterns in high performance computing applications. Spatial and temporal blocking have been proposed to overcome the memory-bound nature of this type of computation by moving…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-02-04 · Kazuaki Matsumura, Hamid Reza Zohouri, Mohamed Wahib, Toshio Endo, Satoshi Matsuoka

Stencil kernels dominate a range of scientific applications, including seismic and medical imaging, image processing, and neural networks. Temporal blocking is a performance optimization that aims to reduce the required memory bandwidth of…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-02-26 · George Bisbas, Fabio Luporini, Mathias Louboutin, Rhodri Nelson, Gerard Gorman, Paul H. J. Kelly

We advocate the Loop-of-stencil-reduce pattern as a means of simplifying the implementation of data-parallel programs on heterogeneous multi-core platforms. Loop-of-stencil-reduce is general enough to subsume map, reduce, map-reduce,…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-09-16 · M. Aldinucci, M. Danelutto, M. Drocco, P. Kilpatrick, C. Misale, G. Peretti Pezzi, M. Torquati

Emerging hybrid accelerator architectures for high performance computing are often suited for the use of a data-parallel programming model. Unfortunately, programmers of these architectures face a steep learning curve that frequently…

Programming Languages · Computer Science · 2015-02-13 · Craig Rasmussen, Matthew Sottile, Daniel Nagle, Soren Rasmussen

The challenges associated with effectively programming FPGAs have been a major blocker in popularising reconfigurable architectures for HPC workloads. However new compiler technologies, such as MLIR, are providing new capabilities which…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-10-04 · Gabriel Rodriguez-Canal, Nick Brown, Maurice Jamieson, Emilien Bauer, Anton Lydike, Tobias Grosser

Good process-to-compute-node mappings can be decisive for well performing HPC applications. A special, important class of process-to-node mapping problems is the problem of mapping processes that communicate in a sparse stencil pattern to…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-05-21 · Sascha Hunold, Konrad von Kirchbach, Markus Lehr, Christian Schulz, Jesper Larsson Träff

Stencil computations are widely used in HPC applications. Today, many HPC platforms use GPUs as accelerators. As a result, understanding how to perform stencil computations fast on GPUs is important. While implementation strategies for…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-09-16 · Ryuichi Sai, John Mellor-Crummey, Xiaozhu Meng, Mauricio Araya-Polo, Jie Meng

In this paper we evaluate the performance of FPGAs for high-order stencil computation using High-Level Synthesis. We show that despite the higher computation intensity and on-chip memory requirement of such stencils compared to first-order…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-02-17 · Hamid Reza Zohouri, Artur Podobas, Satoshi Matsuoka

Stencil computations consume a major part of runtime in many scientific simulation codes. As prototypes for this class of algorithms we consider the iterative Jacobi and Gauss-Seidel smoothers and aim at highly efficient parallel…

Performance · Computer Science · 2012-03-01 · Jan Treibig, Gerhard Wellein, Georg Hager

The design of many-core neuromorphic hardware is getting more and more complex as these systems are expected to execute large machine learning models. To deal with the design complexity, a predictable design flow is needed to guarantee…

Neural and Evolutionary Computing · Computer Science · 2021-08-31 · Shihao Song, M. Lakshmi Varshika, Anup Das, Nagarajan Kandasamy

Optimizing the performance of stencil algorithms has been the subject of intense research over the last two decades. Since many stencil schemes have low arithmetic intensity, most optimizations focus on increasing the temporal data access…

Distributed, Parallel, and Cluster Computing · Computer Science · 2015-10-19 · Tareq Malas, Georg Hager, Hatem Ltaief, David Keyes

Fast and accurate climate simulations and weather predictions are critical for understanding and preparing for the impact of climate change. Real-world weather and climate modeling consists of complex compound stencil kernels that do not…
