Related papers: GPU Offloading in ExaHyPE Through C++ Standard Alg…

200 papers

HPC systems employ a growing variety of compute accelerators with different architectures and from different vendors. Large scientific applications are required to run efficiently across these systems but need to retain a single code-base…

On the way to Exascale, programmers face the increasing challenge of having to support multiple hardware architectures from the same code base. At the same time, portability of code and performance are increasingly difficult to achieve as…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-14 Thomas Heller, Hartmut Kaiser, Patrick Diehl, Dietmar Fey, Marc Alexander Schweitzer

The modern trend in High-Performance Computing (HPC) involves the use of accelerators such as Graphics Processing Units (GPUs) alongside Central Processing Units (CPUs) to speed up numerical operations in various applications. Leading…

Mathematical Software · Computer Science 2025-07-25 Giulio Malenza, Giovanni Stabile, Filippo Spiga, Robert Birke, Marco Aldinucci

ExaHyPE ("An Exascale Hyperbolic PDE Engine") is a software engine for solving systems of first-order hyperbolic partial differential equations (PDEs). Hyperbolic PDEs are typically derived from the conservation laws of physics and are…

In this paper, we introduce a software-defined framework that enables the parallel utilization of all the programmable processing resources available in heterogeneous system-on-chip (SoC) including FPGA-based hardware accelerators and…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-12 Jose Nunez-Yanez, Mohammad Hosseinabady, Moslem Amiri, Andrés Rodríguez, Rafael Asenjo, Angeles Navarro, Rubén Gran-Tejero, Darío Suárez-Gracia

Developing parallel algorithms efficiently requires careful management of concurrency across diverse hardware architectures. C++ executors provide a standardized interface that simplifies the development process, allowing developers to…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-22 Karame Mohammadiporshokooh, Steven R. Brandt, Hartmut Kaiser

High-fidelity simulations of unsteady fluid flow are now possible with advancements in high-performance computing hardware and software frameworks. Since computational fluid dynamics (CFD) computations are dominated by linear algebraic…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-28 Rahul Sundar, Dipanjan Majumdar, Chhote Lal Shah, Sunetra Sarkar

Heterogeneous systems are becoming more common on High Performance Computing (HPC) systems. Even using tools like CUDA and OpenCL it is a non-trivial task to obtain optimal performance on the GPU. Approaches to simplifying this task include…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-01-11 Marek Blazewicz, Steven R. Brandt, Peter Diener, David M. Koppelman, Krzysztof Kurowski, Frank Löffler, Erik Schnetter, Jian Tao

We propose a GPU fine-grained load-balancing abstraction that decouples load balancing from work processing and aims to support both static and dynamic schedules with a programmable interface to implement new load-balancing schedules. Prior…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-13 Muhammad Osama, Serban D. Porumbescu, John D. Owens

The simplex algorithm has been successfully used for many years in solving linear programming (LP) problems. Due to the intensive computations required (especially for the solution of large LP problems), parallel approaches have also…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-11-22 Basilis Mamalis, Marios Perlitis

Software developers must adapt to keep up with the changing capabilities of platforms so that they can utilize the power of High-Performance Computers (HPC), including exascale systems. OpenMP, a directive-based parallel programming model,…

In this paper, we introduce Heteroflow, a new C++ library to help developers quickly write parallel CPU-GPU programs using task dependency graphs. Heteroflow leverages the power of modern C++ and task-based approaches to enable efficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-03-17 Tsung-Wei Huang, Yibo Lin

Over the last decade, most of the increase in computing power has been gained by advances in accelerated many-core architectures, mainly in the form of GPGPUs. While accelerators achieve phenomenal performances in various computing tasks,…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-16 Yehonatan Fridman, Guy Tamir, Gal Oren

This documentation is designed for beginners in Graphics Processing Unit (GPU) programming who want to get familiar with the OpenACC and OpenMP offloading models. Here we present an overview of these two programming models as well as of the…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-01-31 Hichan Agueny

Electrical and electronic engineering has used parallel programming to solve its large-scale, complex problems for performance reasons. However, as parallel programming requires a non-trivial distribution of tasks and data, developers…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-07-05 Antonio Wendell De Oliveira Rodrigues, Frédéric Guyomarc'H, Jean-Luc Dekeyser, Yvonnick Le Menach

With the increasing diversity of heterogeneous architectures in the HPC industry, porting a legacy application to run on different architectures is a tough challenge. In this paper, we present OpenMP Advisor, a first-of-its-kind compiler…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-11 Alok Mishra, Abid M. Malik, Meifeng Lin, Barbara Chapman

OpenCAEPoro is a parallel numerical simulation software developed in C++ for simulating multiphase and multicomponent flows in porous media. The software utilizes a set of general-purpose compositional model equations, enabling it to handle…

Mathematical Software · Computer Science 2024-06-18 Shizhe Li, Chen-Song Zhang

In the last several years, GPUs have been used to accelerate computations in many computer science domains. We focused on GPU-accelerated Support Vector Machine (SVM) training with non-linear kernel functions. We searched for all available GPU…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-07-21 Jan Vanek, Josef Michalek, Josef Psutka

This study systematically tests a computational power reuse scheme proposed by the open-source community, namely disabling specific instruction sets (Fused Multiply-Add instructions) through CUDA source code modifications, on the NVIDIA CMP 170HX…

Hardware Architecture · Computer Science 2025-05-09 Xing Kangwei

Experience shows that on today's high performance systems the utilization of different acceleration cards in conjunction with a high utilization of all other parts of the system is difficult. Future architectures, like exascale clusters,…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-03-07 Patrick Diehl, Madhavan Seshadri, Thomas Heller, Hartmut Kaiser