English
Related papers

Related papers: Fast Stencil-Code Computation on a Wafer-Scale Pro…

200 papers

Stencil computations are a fundamental kernel in scientific computing, critical for simulations in domains such as fluid dynamics and climate modeling. However, these computations are often memory-bound on traditional High-Performance…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-11 Elia Belli , Daniele De Sensi

The Cerebras Wafer-Scale Engine (WSE) delivers performance at an unprecedented scale of over 900,000 compute units, all connected via a single-wafer on-chip interconnect. Initially designed for AI, the WSE architecture is also well-suited…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-27 Nicolai Stawinoga , David Katz , Anton Lydike , Justs Zarins , Nick Brown , George Bisbas , Tobias Grosser

The Cerebras Wafer Scale Engine (WSE) is an accelerator that combines hundreds of thousands of AI-cores onto a single chip. Whilst this technology has been designed for machine learning workloads, the significant amount of available raw…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-10-11 Nick Brown , Brandon Echols , Justs Zarins , Tobias Grosser

Stencil computations are a key class of applications, widely used in the scientific computing community, and a class that has particularly benefited from performance improvements on architectures with high memory bandwidth. Unfortunately,…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-10-27 Istvan Z Reguly , Gihan R Mudalige , Michael B Giles

Stencil computations lie at the heart of many scientific and industrial applications. Unfortunately, stencil algorithms perform poorly on machines with cache based memory hierarchy, due to low re-use of memory accesses. This work shows that…

Mathematical Software · Computer Science 2022-04-11 Mathias Jacquelin , Mauricio Araya-Polo , Jie Meng

In this paper we evaluate the performance of FPGAs for high-order stencil computation using High-Level Synthesis. We show that despite the higher computation intensity and on-chip memory requirement of such stencils compared to first-order…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-02-17 Hamid Reza Zohouri , Artur Podobas , Satoshi Matsuoka

We have implemented fast Fourier transforms for one, two, and three-dimensional arrays on the Cerebras CS-2, a system whose memory and processing elements reside on a single silicon wafer. The wafer-scale engine (WSE) encompasses a…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-26 Marcelo Orenes-Vera , Ilya Sharapov , Robert Schreiber , Mathias Jacquelin , Philippe Vandermersch , Sharan Chetlur

Stencil computation is one of the most used kernels in a wide variety of scientific applications, ranging from large-scale weather prediction to solving partial differential equations. Stencil computations are characterized by three unique…

Hardware Architecture · Computer Science 2023-09-07 Alain Denzler , Rahul Bera , Nastaran Hajinazar , Gagandeep Singh , Geraldo F. Oliveira , Juan Gómez-Luna , Onur Mutlu

Stencil computations are widely used in HPC applications. Today, many HPC platforms use GPUs as accelerators. As a result, understanding how to perform stencil computations fast on GPUs is important. While implementation strategies for…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-16 Ryuichi Sai , John Mellor-Crummey , Xiaozhu Meng , Mauricio Araya-Polo , Jie Meng

Finite-difference methods based on high-order stencils are widely used in seismic simulations, weather forecasting, computational fluid dynamics, and other scientific applications. Achieving HPC-level stencil computations on one…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-09 Ryuichi Sai , John Mellor-Crummey , Jinfan Xu , Mauricio Araya-Polo

Molecular dynamics (MD) simulations have transformed our understanding of the nanoscale, driving breakthroughs in materials science, computational chemistry, and several other fields, including biophysics and drug design. Even on exascale…

In this work we evaluate the potential of FPGAs for accelerating HPC workloads as a more power-efficient alternative to GPUs. Using High-Level Synthesis and a large set of optimization techniques, we show that FPGAs can achieve better…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-09-17 Hamid Reza Zohouri

In this era of diverse and heterogeneous computer architectures, the programmability issues, such as productivity and portable efficiency, are crucial to software development and algorithm design. One way to approach the problem is to step…

Mathematical Software · Computer Science 2012-07-10 Mauro Bianco , Ugo Varetto

Stencil computation is an important class of scientific applications that can be efficiently executed by graphics processing units (GPUs). Out-of-core approach helps run large scale stencil codes that process data with sizes larger than the…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-14 Jingcheng Shen , Yifan Wu , Masao Okita , Fumihiko Ino

Modern compute nodes in high-performance computing provide a tremendous level of parallelism and processing power. However, as arithmetic performance has been observed to increase at a faster rate relative to memory and network bandwidths,…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-05-11 Johannes Pekkilä , Miikka S. Väisälä , Maarit J. Käpylä , Matthias Rheinhardt , Oskar Lappi

An out-of-core stencil computation code handles large data whose size is beyond the capacity of GPU memory. Whereas, such an code requires streaming data to and from the GPU frequently. As a result, data movement between the CPU and GPU…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-04-26 Jingcheng Shen , Xin Deng , Yifan Wu , Masao Okita , Fumihiko Ino

Block iterative methods are extremely important as smoothers for multigrid methods, as preconditioners for Krylov methods, and as solvers for diagonally dominant linear systems. Developing robust and efficient algorithms suitable for…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-07-16 Manuel Birke , Bobby Philip , Zhen Wang , Mark Berrill

This paper presents a workflow for synthesizing near-optimal FPGA implementations for structured-mesh based stencil applications for explicit solvers. It leverages key characteristics of the application class, its computation-communication…

Hardware Architecture · Computer Science 2021-01-08 Kamalavasan Kamalakkannan , Gihan R. Mudalige , Istvan Z. Reguly , Suhaib A. Fahmy

Over the last ten years, graphics processors have become the de facto accelerator for data-parallel tasks in various branches of high-performance computing, including machine learning and computational sciences. However, with the recent…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-28 Johannes Pekkilä , Oskar Lappi , Fredrik Robertsén , Maarit J. Korpi-Lagg

Cerebras' wafer-scale engine (WSE) technology merges multiple dies on a single wafer. It addresses the challenges of memory bandwidth, latency, and scalability, making it suitable for artificial intelligence. This work evaluates the WSE-3…

Hardware Architecture · Computer Science 2025-03-18 Yudhishthira Kundu , Manroop Kaur , Tripty Wig , Kriti Kumar , Pushpanjali Kumari , Vivek Puri , Manish Arora
‹ Prev 1 2 3 10 Next ›