Related papers: Bind: a Partitioned Global Workflow Parallel Progr…
Parallel computing is a standard approach to achieving high-performance computing (HPC). Three commonly used methods to implement parallel computing include: 1) applying multithreading technology on single-core or multi-core CPUs; 2)…
Parallelization is needed everywhere, from laptops and mobile phones to supercomputers. Among parallel programming models, task-based programming has demonstrated a powerful potential and is widely used in high-performance scientific…
Pipeline is a fundamental parallel programming pattern. Mainstream pipeline programming frameworks count on data abstractions to perform pipeline scheduling. This design is convenient for data-centric pipeline applications but inefficient…
We propose Chunks and Tasks, a parallel programming model built on abstractions for both data and work. The application programmer specifies how data and work can be split into smaller pieces, chunks and tasks, respectively. The Chunks and…
Multicore parallel programming has some very difficult problems such as deadlocks during synchronizations and race conditions brought by concurrency. Added to the difficulty is the lack of a simple, well-accepted computing model for…
In order to improve system performance efficiently, a number of systems choose to equip multi-core and many-core processors (such as GPUs). Due to their discrete memory these heterogeneous architectures comprise a distributed system within…
Programming modern high-performance computing systems is challenging due to the need to efficiently program GPUs and accelerators and to handle data movement between nodes. The C++ language has been continuously enhanced in recent years…
Heterogeneous many-cores are now an integral part of modern computing systems ranging from embedding systems to supercomputers. While heterogeneous many-core design offers the potential for energy-efficient high-performance, such potential…
Optimizing communication performance is imperative for large-scale computing because communication overheads limit the strong scalability of parallel applications. Today's network cards contain rather powerful processors optimized for data…
Writing high performance solvers for engineering applications is a delicate task. These codes are often developed on an application to application basis, highly optimized to solve a certain problem. Here, we present our work on developing a…
In this era of diverse and heterogeneous computer architectures, the programmability issues, such as productivity and portable efficiency, are crucial to software development and algorithm design. One way to approach the problem is to step…
This paper consists of three parts. The first part provides a unified programming model for heterogeneous computing with CPU and accelerator (like GPU, FPGA, Google TPU, Atos QPU, and more) technologies. To some extent, this new programming…
The advent of high performance computing (HPC) and graphics processing units (GPU), present an enormous computation resource for Large data transactions (big data) that require parallel processing for robust and prompt data analysis. While…
Current high-performance computer systems used for scientific computing typically combine shared memory computational nodes in a distributed memory environment. Extracting high performance from these complex systems requires tailored…
Parallel and distributed application design is a major area of interest in the domain of high performance scientific and industrial computing. Over the years, various approaches have been proposed to aid parallel program developers to…
Parallel processing, the core of High Performance Computing (HPC), was and still the most effective way in improving the speed of computer systems. For the past few years, the substantial developments in the computing power of processors…
Developing software to effectively take advantage of growth in parallel and distributed processing capacity poses significant challenges. Traditional programming techniques allow a user to assume that execution, message passing, and memory…
Parallel computing can offer an enormous advantage regarding the performance for very large applications in almost any field: scientific computing, computer vision, databases, data mining, and economics. GPUs are high performance many-core…
The convergence of IoT, Edge, Cloud, and HPC technologies creates a compute continuum that merges cloud scalability and flexibility with HPC's computational power and specialized optimizations. However, integrating cloud and HPC resources…
This paper investigates co-scheduling algorithms for processing a set of parallel applications. Instead of executing each application one by one, using a maximum degree of parallelism for each of them, we aim at scheduling several…