Related papers: Lifting C Semantics for Dataflow Optimization

Closing the Performance Gap with Modern C++

On the way to Exascale, programmers face the increasing challenge of having to support multiple hardware architectures from the same code base. At the same time, portability of code and performance are increasingly difficult to achieve as…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-14 Thomas Heller , Hartmut Kaiser , Patrick Diehl , Dietmar Fey , Marc Alexander Schweitzer

Application specific dataflow machine construction for programming FPGAs via Lucent

Field Programmable Gate Arrays (FPGAs) have the potential to accelerate specific HPC codes. However even with the advent of High Level Synthesis (HLS), which enables FPGA programmers to write code in C or C++, programming such devices still…

Programming Languages · Computer Science 2021-04-13 Nick Brown

Fork is All You Need in Heterogeneous Systems

We present a unified programming model for heterogeneous computing systems. Such systems integrate multiple computing accelerators and memory units to deliver higher performance than CPU-centric systems. Although heterogeneous systems have…

Emerging Technologies · Computer Science 2024-04-18 Zixuan Wang , Jishen Zhao

StreamBlocks: A compiler for heterogeneous dataflow computing (technical report)

To increase performance and efficiency, systems use FPGAs as reconfigurable accelerators. A key challenge in designing these systems is partitioning computation between processors and an FPGA. An appropriate division of labor may be…

Hardware Architecture · Computer Science 2021-07-21 Endri Bezati , Mahyar Emami , Jörn Janneck , James Larus

Shared memory parallelism in Modern C++ and HPX

Parallel programming remains a daunting challenge, from the struggle to express a parallel algorithm without cluttering the underlying synchronous logic, to describing which devices to employ in a calculation, to correctness. Over the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-10 Patrick Diehl , Steven R. Brandt , Hartmut Kaiser

cf4ocl: a C framework for OpenCL

OpenCL is an open standard for parallel programming of heterogeneous compute devices, such as GPUs, CPUs, DSPs or FPGAs. However, the verbosity of its C host API can hinder application development. In this paper we present cf4ocl, a…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-05-16 Nuno Fachada , Vitor V. Lopes , Rui C. Martins , Agostinho C. Rosa

HPC-Coder: Modeling Parallel Programs using Large Language Models

Parallel programs in high performance computing (HPC) continue to grow in complexity and scale in the exascale era. The diversity in hardware and parallel programming models make developing, optimizing, and maintaining parallel software…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-15 Daniel Nichols , Aniruddha Marathe , Harshitha Menon , Todd Gamblin , Abhinav Bhatele

Python FPGA Programming with Data-Centric Multi-Level Design

Although high-level synthesis (HLS) tools have significantly improved programmer productivity over hardware description languages, developing for FPGAs remains tedious and error prone. Programmers must learn and implement a large set of…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-12-29 Johannes de Fine Licht , Tiziano De Matteis , Tal Ben-Nun , Andreas Kuster , Oliver Rausch , Manuel Burger , Carl-Johannes Johnsen , Torsten Hoefler

C++ programming language for an abstract massively parallel SIMD architecture

The aim of this work is to define and implement an extended C++ language to support the SIMD programming paradigm. The C++ programming language has been extended to express all the potentiality of an abstract SIMD machine consisting of a…

Programming Languages · Computer Science 2007-05-23 Alessandro Lonardo , Emanuele Panizzi , Benedetto Proietti

Transparent Compiler and Runtime Specializations for Accelerating Managed Languages on FPGAs

In recent years, heterogeneous computing has emerged as the vital way to increase computers? performance and energy efficiency by combining diverse hardware devices, such as Graphics Processing Units (GPUs) and Field Programmable Gate…

Programming Languages · Computer Science 2020-11-02 Michail Papadimitriou , Juan Fumero , Athanasios Stratikopoulos , Foivos S. Zakkak , Christos Kotselidis

Concurrent CPU-GPU Task Programming using Modern C++

In this paper, we introduce Heteroflow, a new C++ library to help developers quickly write parallel CPU-GPU programs using task dependency graphs. Heteroflow leverages the power of modern C++ and task-based approaches to enable efficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-03-17 Tsung-Wei Huang , Yibo Lin

Bridging Control-Centric and Data-Centric Optimization

With the rise of specialized hardware and new programming languages, code optimization has shifted its focus towards promoting data locality. Most production-grade compilers adopt a control-centric mindset - instruction-driven optimization…

Programming Languages · Computer Science 2023-06-02 Tal Ben-Nun , Berke Ates , Alexandru Calotoiu , Torsten Hoefler

ParaGraph: Weighted Graph Representation for Performance Optimization of HPC Kernels

GPU-based HPC clusters are attracting more scientific application developers due to their extensive parallelism and energy efficiency. In order to achieve portability among a variety of multi/many core architectures, a popular choice for an…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-10 Ali TehraniJamsaz , Alok Mishra , Akash Dutta , Abid M. Malik , Barbara Chapman , Ali Jannesari

Deterministic Consistency: A Programming Model for Shared Memory Parallelism

The difficulty of developing reliable parallel software is generating interest in deterministic environments, where a given program and input can yield only one possible result. Languages or type systems can enforce determinism in new code,…

Operating Systems · Computer Science 2010-02-01 Amittai Aviram , Bryan Ford

Continuation-Passing C: compiling threads to events through continuations

In this paper, we introduce Continuation Passing C (CPC), a programming language for concurrent systems in which native and cooperative threads are unified and presented to the programmer as a single abstraction. The CPC compiler uses a…

Programming Languages · Computer Science 2012-11-15 Gabriel Kerneis , Juliusz Chroboczek

Parallel Prefix Polymorphism Permits Parallelization, Presentation & Proof

Polymorphism in programming languages enables code reuse. Here, we show that polymorphism has broad applicability far beyond computations for technical computing: parallelism in distributed computing, presentation of visualizations of…

Programming Languages · Computer Science 2014-11-07 Jiahao Chen , Alan Edelman

Generating Configurable Hardware from Parallel Patterns

In recent years the computing landscape has seen an in- creasing shift towards specialized accelerators. Field pro- grammable gate arrays (FPGAs) are particularly promising as they offer significant performance and energy improvements…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-24 Raghu Prabhakar , David Koeplinger , Kevin Brown , HyoukJoong Lee , Christopher De Sa , Christos Kozyrakis , Kunle Olukotun

A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture

Parallel computing is a standard approach to achieving high-performance computing (HPC). Three commonly used methods to implement parallel computing include: 1) applying multithreading technology on single-core or multi-core CPUs; 2)…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-18 Xinyao Yi

Datalog-based Scalable Semantic Diffing of Concurrent Programs

When an evolving program is modified to address issues related to thread synchronization, there is a need to confirm the change is correct, i.e., it does not introduce unexpected behavior. However, manually comparing two programs to…

Software Engineering · Computer Science 2018-07-17 Chungha Sung , Shuvendu Lahiri , Constantin Enea , Chao Wang

Cross-Platform Performance Portability Using Highly Parametrized SYCL Kernels

Over recent years heterogeneous systems have become more prevalent across HPC systems, with over 100 supercomputers in the TOP500 incorporating GPUs or other accelerators. These hardware platforms have different performance characteristics…

Performance · Computer Science 2019-04-11 John Lawson , Mehdi Goli , Duncan McBain , Daniel Soutar , Louis Sugy