Related papers: Couillard: Parallel Programming via Coarse-Grained…

Labyrinth: Compiling Imperative Control Flow to Parallel Dataflows

Parallel dataflow systems have become a standard technology for large-scale data analytics. Complex data analysis programs in areas such as machine learning and graph analytics often involve control flow, i.e., iterations and branching.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-10-16 · Gábor E. Gévay, Tilmann Rabl, Sebastian Breß, Loránd Madai-Tahy, Volker Markl

Uniting Control and Data Parallelism: Towards Scalable Memory-Driven Dynamic Graph Processing

Control parallelism and data parallelism are mostly reasoned about and optimized as separate functions. Because of this, workloads that are irregular, fine-grained, and dynamic, such as dynamic graph processing, become very hard to scale. An…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-03-08 · Bibrak Qamar Chandio, Thomas Sterling, Prateek Srivastava

Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing System

Taskflow aims to streamline the building of parallel and heterogeneous applications using a lightweight task graph-based approach. Taskflow introduces an expressive task graph programming model to assist developers in the implementation of…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-09-08 · Tsung-Wei Huang, Dian-Lun Lin, Chun-Xun Lin, Yibo Lin

An LLM-Tool Compiler for Fused Parallel Function Calling

State-of-the-art sequential reasoning in Large Language Models (LLMs) has expanded the capabilities of Copilots beyond conversational tasks to complex function calling, managing thousands of API calls. However, the tendency of compositional…

Programming Languages · Computer Science · 2024-05-29 · Simranjit Singh, Andreas Karatzas, Michael Fore, Iraklis Anagnostopoulos, Dimitrios Stamoulis

A Datalog-based Computational Model for Coordination-free, Data-Parallel Systems

Cloud computing refers to maximizing efficiency by sharing computational and storage resources, while data-parallel systems exploit the resources available in the cloud to perform parallel transformations over large amounts of data. In the…

Databases · Computer Science · 2018-07-10 · Matteo Interlandi, Letizia Tanca

Pipeflow: An Efficient Task-Parallel Pipeline Programming Framework using Modern C++

Pipeline is a fundamental parallel programming pattern. Mainstream pipeline programming frameworks count on data abstractions to perform pipeline scheduling. This design is convenient for data-centric pipeline applications but inefficient…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-02-03 · Cheng-Hsiang Chiu, Tsung-Wei Huang, Zizheng Guo, Yibo Lin

Automatic Model Parallelism for Deep Neural Networks with Compiler and Hardware Support

Deep neural networks (DNNs) have been enormously successful in tasks that were hitherto in the human-only realm, such as image recognition and language translation. Owing to their success, DNNs are being explored for use in ever more…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-06-20 · Sanket Tavarageri, Srinivas Sridharan, Bharat Kaul

Unified Control and Data Flow Diagrams Applied to Software Engineering and other Systems

More often than not, there is a need to understand the structure of complex computer code: what functions and in what order they are called, how information travels around static, input, and output variables, what depends on what. As a…

Software Engineering · Computer Science · 2016-10-10 · Igor Polkovnikov

PALM: A Efficient Performance Simulator for Tiled Accelerators with Large-scale Model Training

Deep learning (DL) models are piquing high interest and scaling at an unprecedented rate. To this end, a handful of tiled accelerators have been proposed to support such large-scale training tasks. However, these accelerators often…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-06-07 · Jiahao Fang, Huizheng Wang, Qize Yang, Dehao Kong, Xu Dai, Jinyi Deng, Yang Hu, Shouyi Yin

Maximum Flows in Parametric Graph Templates

Execution graphs of parallel loop programs exhibit a nested, repeating structure. We show how such graphs that are the result of nested repetition can be represented by succinct parametric structures. This parametric graph template…

Data Structures and Algorithms · Computer Science · 2023-07-18 · Tal Ben-Nun, Lukas Gianinazzi, Torsten Hoefler, Yishai Oltchik

Cimple: Instruction and Memory Level Parallelism

Modern out-of-order processors have increased capacity to exploit instruction level parallelism (ILP) and memory level parallelism (MLP), e.g., by using wide superscalar pipelines and vector execution units, as well as deep buffers for…

Programming Languages · Computer Science · 2018-07-05 · Vladimir Kiriansky, Haoran Xu, Martin Rinard, Saman Amarasinghe

The Autonomous Data Language -- Concepts, Design and Formal Verification

Nowadays, the main advances in computational power are due to parallelism. However, most parallel languages have been designed with a focus on processors and threads. This makes dealing with data and memory in programs hard, which distances…

Programming Languages · Computer Science · 2025-12-12 · Tom T. P. Franken, Thomas Neele, Jan Friso Groote

A UML-based Approach to Design Parallel and Distributed Applications

Parallel and distributed application design is a major area of interest in the domain of high performance scientific and industrial computing. Over the years, various approaches have been proposed to aid parallel program developers to…

Software Engineering · Computer Science · 2013-11-28 · Yasset Perez-Riverol, Roberto Vera Alvarez

Automatic Parallelization: Executing Sequential Programs on a Task-Based Parallel Runtime

There are billions of lines of sequential code in today's software that do not benefit from the parallelism available in modern multicore architectures. Automatically parallelizing sequential code, to promote an efficient use of the…

Programming Languages · Computer Science · 2016-04-13 · Alcides Fonseca, Bruno Cabral, João Rafael, Ivo Correia

StreamBlocks: A compiler for heterogeneous dataflow computing (technical report)

To increase performance and efficiency, systems use FPGAs as reconfigurable accelerators. A key challenge in designing these systems is partitioning computation between processors and an FPGA. An appropriate division of labor may be…

Hardware Architecture · Computer Science · 2021-07-21 · Endri Bezati, Mahyar Emami, Jörn Janneck, James Larus

Almost Continuous Transformations of Software and Higher-order Dataflow Programming

We consider two classes of stream-based computations which admit taking linear combinations of execution runs: probabilistic sampling and generalized animation. The dataflow architecture is a natural platform for programming with streams.…

Programming Languages · Computer Science · 2016-01-06 · Michael Bukatin, Steve Matthews

Efficient Tree-Traversals: Reconciling Parallelism and Dense Data Representations

Recent work showed that compiling functional programs to use dense, serialized memory representations for recursive algebraic datatypes can yield significant constant-factor speedups for sequential programs. But serializing data in a…

Programming Languages · Computer Science · 2021-07-02 · Chaitanya Koparkar, Mike Rainey, Michael Vollmer, Milind Kulkarni, Ryan R. Newton

Concurrent Scheduling of High-Level Parallel Programs on Multi-GPU Systems

Parallel programming models can encourage performance portability by moving the responsibility for work assignment and data distribution from the programmer to a runtime system. However, analyzing the resulting implicit memory allocations,…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-03-14 · Fabian Knorr, Philip Salzmann, Peter Thoman, Thomas Fahringer

A task-based data-flow methodology for programming heterogeneous systems with multiple accelerator APIs

Heterogeneous nodes that combine multi-core CPUs with diverse accelerators are rapidly becoming the norm in both high-performance computing (HPC) and AI infrastructures. Exploiting these platforms, however, requires orchestrating several…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-03-02 · Aleix Boné, Alejandro Aguirre, David Álvarez, Pedro J. Martinez-Ferrer, Vicenç Beltran

Extending TensorFlow's Semantics with Pipelined Execution

TensorFlow is a popular cloud computing framework that targets machine learning applications. It separates the specification of application logic (in a dataflow graph) from the execution of the logic. TensorFlow's native runtime executes…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-08-27 · Sam Whitlock, James Larus, Edouard Bugnion