Related papers: Bind: a Partitioned Global Workflow Parallel Progr…

A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture

Parallel computing is a standard approach to achieving high-performance computing (HPC). Three commonly used methods to implement parallel computing include: 1) applying multithreading technology on single-core or multi-core CPUs; 2)…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-18 Xinyao Yi

Specx: a C++ task-based runtime system for heterogeneous distributed architectures

Parallelization is needed everywhere, from laptops and mobile phones to supercomputers. Among parallel programming models, task-based programming has demonstrated a powerful potential and is widely used in high-performance scientific…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-18 Paul Cardosi , Bérenger Bramas

Pipeflow: An Efficient Task-Parallel Pipeline Programming Framework using Modern C++

Pipeline is a fundamental parallel programming pattern. Mainstream pipeline programming frameworks count on data abstractions to perform pipeline scheduling. This design is convenient for data-centric pipeline applications but inefficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-02-03 Cheng-Hsiang Chiu , Tsung-Wei Huang , Zizheng Guo , Yibo Lin

Chunks and Tasks: a programming model for parallelization of dynamic algorithms

We propose Chunks and Tasks, a parallel programming model built on abstractions for both data and work. The application programmer specifies how data and work can be split into smaller pieces, chunks and tasks, respectively. The Chunks and…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-07-29 Emanuel H. Rubensson , Elias Rudberg

A Generalized Streaming Model for Concurrent Computing

Multicore parallel programming has some very difficult problems such as deadlocks during synchronizations and race conditions brought by concurrency. Added to the difficulty is the lack of a simple, well-accepted computing model for…

Programming Languages · Computer Science 2010-12-09 Yibing Wang

A Graph-Partition-Based Scheduling Policy for Heterogeneous Architectures

In order to improve system performance efficiently, a number of systems choose to equip multi-core and many-core processors (such as GPUs). Due to their discrete memory these heterogeneous architectures comprise a distributed system within…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-02-27 Hao Wu , Daniel Lohmann , Wolfgang Schröder-Preikschat

Toward Heterogeneous, Distributed, and Energy-Efficient Computing with SYCL

Programming modern high-performance computing systems is challenging due to the need to efficiently program GPUs and accelerators and to handle data movement between nodes. The C++ language has been continuously enhanced in recent years…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-12 Biagio Cosenza , Lorenzo Carpentieri , Kaijie Fan , Marco D'Antonio , Peter Thoman , Philip Salzmann

Parallel Programming Models for Heterogeneous Many-Cores : A Survey

Heterogeneous many-cores are now an integral part of modern computing systems ranging from embedding systems to supercomputers. While heterogeneous many-core design offers the potential for energy-efficient high-performance, such potential…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-11 Jianbin Fang , Chun Huang , Tao Tang , Zheng Wang

sPIN: High-performance streaming Processing in the Network

Optimizing communication performance is imperative for large-scale computing because communication overheads limit the strong scalability of parallel applications. Today's network cards contain rather powerful processors optimized for data…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-10-20 Torsten Hoefler , Salvatore Di Girolamo , Konstantin Taranov , Ryan E. Grant , Ron Brightwell

CUBE: A scalable framework for large-scale industrial simulations

Writing high performance solvers for engineering applications is a delicate task. These codes are often developed on an application to application basis, highly optimized to solve a certain problem. Here, we present our work on developing a…

Computational Engineering, Finance, and Science · Computer Science 2018-08-14 Niclas Jansson , Rahul Bale , Keiji Onishi , Makoto Tsubokura

A Generic Library for Stencil Computations

In this era of diverse and heterogeneous computer architectures, the programmability issues, such as productivity and portable efficiency, are crucial to software development and algorithm design. One way to approach the problem is to step…

Mathematical Software · Computer Science 2012-07-10 Mauro Bianco , Ugo Varetto

A Unified Programming Model for Heterogeneous Computing with CPU and Accelerator Technologies

This paper consists of three parts. The first part provides a unified programming model for heterogeneous computing with CPU and accelerator (like GPU, FPGA, Google TPU, Atos QPU, and more) technologies. To some extent, this new programming…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-31 Yuqing Xiong

A Review of CUDA, MapReduce, and Pthreads Parallel Computing Models

The advent of high performance computing (HPC) and graphics processing units (GPU), present an enormous computation resource for Large data transactions (big data) that require parallel processing for robust and prompt data analysis. While…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-10-17 Kato Mivule , Benjamin Harvey , Crystal Cobb , Hoda El Sayed

DuctTeip: An efficient programming model for distributed task based parallel computing

Current high-performance computer systems used for scientific computing typically combine shared memory computational nodes in a distributed memory environment. Extracting high performance from these complex systems requires tailored…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-14 Afshin Zafari , Elisabeth Larsson , Martin Tillenius

A UML-based Approach to Design Parallel and Distributed Applications

Parallel and distributed application design is a major area of interest in the domain of high performance scientific and industrial computing. Over the years, various approaches have been proposed to aid parallel program developers to…

Software Engineering · Computer Science 2013-11-28 Yasset Perez-Riverol , Roberto Vera Alvarez

Scheduling on Grid with communication Delay

Parallel processing, the core of High Performance Computing (HPC), was and still the most effective way in improving the speed of computer systems. For the past few years, the substantial developments in the computing power of processors…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-12-15 Samouriq Difrawi

Conduit: A C++ Library for Best-effort High Performance Computing

Developing software to effectively take advantage of growth in parallel and distributed processing capacity poses significant challenges. Traditional programming techniques allow a user to assume that execution, message passing, and memory…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-24 Matthew Andres Moreno , Santiago Rodriguez Papa , Charles Ofria

Improving the performance of the linear systems solvers using CUDA

Parallel computing can offer an enormous advantage regarding the performance for very large applications in almost any field: scientific computing, computer vision, databases, data mining, and economics. GPUs are high performance many-core…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-24 Bogdan Oancea , Tudorel Andrei , Raluca Mariana Dragoescu

Workflow-Driven Modeling for the Compute Continuum: An Optimization Approach to Automated System and Workload Scheduling

The convergence of IoT, Edge, Cloud, and HPC technologies creates a compute continuum that merges cloud scalability and flexibility with HPC's computational power and specialized optimizations. However, integrating cloud and HPC resources…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-20 Aasish Kumar Sharma , Christian Boehme , Patrick Gelß , Ramin Yahyapour , Julian Kunkel

Co-Scheduling Algorithms for High-Throughput Workload Execution

This paper investigates co-scheduling algorithms for processing a set of parallel applications. Instead of executing each application one by one, using a maximum degree of parallelism for each of them, we aim at scheduling several…

Data Structures and Algorithms · Computer Science 2013-05-01 Guillaume Aupy , Manu Shantharam , Anne Benoit , Yves Robert , Padma Raghavan