Related papers: Coloured and task-based stencil codes

200 papers

New algorithms and optimization techniques are needed to balance the accelerating trend towards bandwidth-starved multicore chips. It is well known that the performance of stencil codes can be improved by temporal blocking, lessening the…

Performance · Computer Science 2012-03-01 Markus Wittmann, Georg Hager, Gerhard Wellein

Bandwidth-starved multicore chips have become ubiquitous. It is well known that the performance of stencil codes can be improved by temporal blocking, lessening the pressure on the memory interface. We introduce a new pipelined approach…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-06-17 Markus Wittmann, Georg Hager, Jan Treibig, Gerhard Wellein

Stencil computations are a key class of applications, widely used in the scientific computing community, and a class that has particularly benefited from performance improvements on architectures with high memory bandwidth. Unfortunately,…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-10-27 Istvan Z Reguly, Gihan R Mudalige, Michael B Giles

Currently, multi/many-core CPUs are considered standard in most types of computers, including mobile phones, PCs or supercomputers. However, the parallelization of applications as well as refactoring/design of applications for efficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-25 Garip Kusoglu, Berenger Bramas, Stephane Genaud

Programming a distributed system, such as a cluster, requires extended use of low-level communication libraries and can often become cumbersome and error prone for the average developer. In this work, we consider each node of a cluster as a…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-05-24 Ilias Keftakis, Vassilios V. Dimakopoulos

Graph coloring is often used in parallelizing scientific computations that run in distributed and multi-GPU environments; it identifies sets of independent data that can be updated in parallel. Many algorithms exist for graph coloring on a…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-07-02 Ian Bogle, Erik G Boman, Karen D Devine, Sivasankaran Rajamanickam, George M Slota

Multicore has emerged as a typical architecture model since its advent and stands now as a standard. The trend is to increase the number of cores and improve the performance of the memory system. Providing an efficient multicore…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-22 Claude Tadonki

FPGA-based hardware accelerators have received increasing attention mainly due to their ability to accelerate deep pipelined applications, thus resulting in higher computational performance and energy efficiency. Nevertheless, the amount of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-23 R. Nepomuceno, R. Sterle, G. Valarini, M. Pereira, H. Yviquel, G. Araujo

Current high-performance computer systems used for scientific computing typically combine shared memory computational nodes in a distributed memory environment. Extracting high performance from these complex systems requires tailored…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-14 Afshin Zafari, Elisabeth Larsson, Martin Tillenius

Identifying the sets of operations that can be executed simultaneously is an important problem appearing in many parallel applications. By modeling the operations and their interactions as a graph, one can identify the independent…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-07-28 Ahmet Erdem Sarıyüce, Erik Saule, Ümit V. Çatalyürek

Stencil computation is one of the most important kernels in various scientific and engineering applications. A variety of work has focused on vectorization and tiling techniques, aiming at exploiting the in-core data parallelism and data…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-19 Kun Li, Liang Yuan, Yunquan Zhang, Yue Yue, Hang Cao, Pengqi Lu

Good process-to-compute-node mappings can be decisive for well-performing HPC applications. A special, important class of process-to-node mapping problems is the problem of mapping processes that communicate in a sparse stencil pattern to…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-21 Sascha Hunold, Konrad von Kirchbach, Markus Lehr, Christian Schulz, Jesper Larsson Träff

Stencil computations consume a major part of runtime in many scientific simulation codes. As prototypes for this class of algorithms we consider the iterative Jacobi and Gauss-Seidel smoothers and aim at highly efficient parallel…

Performance · Computer Science 2012-03-01 Jan Treibig, Gerhard Wellein, Georg Hager

In this paper, we present multi-threaded algorithms for graph coloring suitable to the shared memory programming model. We modify an existing algorithm widely used in the literature and prove the correctness of the modified algorithm. We…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-05-11 Nandini Singhal, Sathya Peri, Subrahmanyam Kalyanasundaram

Task parallelism as employed by the OpenMP task construct or some Intel Threading Building Blocks (TBB) components, although ideal for tackling irregular problems or typical producer/consumer schemes, bears some potential for performance…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-01-04 Markus Wittmann, Georg Hager

The importance of stencil-based algorithms in computational science has focused attention on optimized parallel implementations for multilevel cache-based processors. Temporal blocking schemes leverage the large bandwidth and low latency of…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-07-28 Tareq Malas, Georg Hager, Hatem Ltaief, Holger Stengel, Gerhard Wellein, David Keyes

In parallel computing, a valid graph coloring yields a lock-free processing of the colored tasks, data points, etc., without expensive synchronization mechanisms. However, coloring is not free and the overhead can be significant. In…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-01-11 Mustafa Kemal Taş, Kamer Kaya, Erik Saule

Task parallelism as employed by the OpenMP task construct, although ideal for tackling irregular problems or typical producer/consumer schemes, bears some potential for performance bottlenecks if locality of data access is important, which…

Performance · Computer Science 2009-02-12 Markus Wittmann, Georg Hager

Parallel task-based programming models, like OpenMP, allow application developers to easily create a parallel version of their sequential codes. The standard OpenMP 4.0 introduced the possibility of describing a set of data dependences per…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-09 Jaume Bosch, Carlos Álvarez, Daniel Jiménez-González, Xavier Martorell, Eduard Ayguadé

Block iterative methods are extremely important as smoothers for multigrid methods, as preconditioners for Krylov methods, and as solvers for diagonally dominant linear systems. Developing robust and efficient algorithms suitable for…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-07-16 Manuel Birke, Bobby Philip, Zhen Wang, Mark Berrill