Related papers: Task Graph Transformations for Latency Tolerance

Plan-over-Graph: Towards Parallelable LLM Agent Schedule

Large Language Models (LLMs) have demonstrated exceptional abilities in reasoning for task planning. However, challenges remain under-explored for parallel schedules. This paper introduces a novel paradigm, plan-over-graph, in which the…

Artificial Intelligence · Computer Science 2025-02-21 Shiqi Zhang, Xinbei Ma, Zouying Cao, Zhuosheng Zhang, Hai Zhao

Task-Graph Scheduling Extensions for Efficient Synchronization and Communication

Task graphs have been studied for decades as a foundation for scheduling irregular parallel applications and incorporated in programming models such as OpenMP. While many high-performance parallel libraries are based on task graphs, they…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-09 Seonmyeong Bak, Oscar Hernandez, Mark Gates, Piotr Luszczek, Vivek Sarkar

Concurrent Scheduling of High-Level Parallel Programs on Multi-GPU Systems

Parallel programming models can encourage performance portability by moving the responsibility for work assignment and data distribution from the programmer to a runtime system. However, analyzing the resulting implicit memory allocations,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-14 Fabian Knorr, Philip Salzmann, Peter Thoman, Thomas Fahringer

The Recognition of Tolerance and Bounded Tolerance Graphs

Tolerance graphs model interval relations in such a way that intervals can tolerate a certain degree of overlap without being in conflict. This subclass of perfect graphs has been extensively studied, due to both its interesting structure…

Computational Complexity · Computer Science 2010-02-03 George B. Mertzios, Ignasi Sau, Shmuel Zaks

targetDP: an Abstraction of Lattice Based Parallelism with Portable Performance

To achieve high performance on modern computers, it is vital to map algorithmic parallelism to that inherent in the hardware. From an application developer's perspective, it is also important that code can be maintained in a portable manner…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-06-20 Alan Gray, Kevin Stratford

LLAMP: Assessing Network Latency Tolerance of HPC Applications with Linear Programming

The shift towards high-bandwidth networks driven by AI workloads in data centers and HPC clusters has unintentionally aggravated network latency, adversely affecting the performance of communication-intensive HPC applications. As…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-23 Siyuan Shen, Langwen Huang, Marcin Chrapek, Timo Schneider, Jai Dayal, Manisha Gajbe, Robert Wisniewski, Torsten Hoefler

Shift Parallelism: Low-Latency, High-Throughput LLM Inference for Dynamic Workloads

Efficient parallelism is necessary for achieving low-latency, high-throughput inference with large language models (LLMs). Tensor parallelism (TP) is the state-of-the-art method for reducing LLM response latency; however, GPU communications…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-27 Mert Hidayetoglu, Aurick Qiao, Michael Wyatt, Jeff Rasley, Yuxiong He, Samyam Rajbhandari

Tolerance to Asynchrony in Algorithms for Multiplication and Modulo

In this article, we study some parallel processing algorithms for multiplication and modulo operations. We demonstrate that the state transitions that are formed under these algorithms satisfy lattice-linearity, where these algorithms…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-16 Arya Tanmay Gupta, Sandeep S Kulkarni

Modeling Task Mapping for Data-intensive Applications in Heterogeneous Systems

We introduce a new model for the task mapping problem to aid in the systematic design of algorithms for heterogeneous systems including, but not limited to, CPUs, GPUs and FPGAs. A special focus is set on the communication between the…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-15 Martin Wilhelm, Hanna Geppert, Anna Drewes, Thilo Pionteck

Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing System

Taskflow aims to streamline the building of parallel and heterogeneous applications using a lightweight task graph-based approach. Taskflow introduces an expressive task graph programming model to assist developers in the implementation of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-08 Tsung-Wei Huang, Dian-Lun Lin, Chun-Xun Lin, Yibo Lin

ParaGraph: Weighted Graph Representation for Performance Optimization of HPC Kernels

GPU-based HPC clusters are attracting more scientific application developers due to their extensive parallelism and energy efficiency. In order to achieve portability among a variety of multi/many core architectures, a popular choice for an…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-10 Ali TehraniJamsaz, Alok Mishra, Akash Dutta, Abid M. Malik, Barbara Chapman, Ali Jannesari

Taskgraph: A Low Contention OpenMP Tasking Framework

OpenMP is the de-facto standard for shared memory systems in High-Performance Computing (HPC). It includes a task-based model that offers a high-level of abstraction to effectively exploit highly dynamic structured and unstructured…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-12-12 Chenle Yu, Sara Royuela, Eduardo Quiñones

A Graph-based Model for GPU Caching Problems

Modeling data sharing in GPU programs is a challenging task because of the massive parallelism and complex data sharing patterns provided by GPU architectures. Better GPU caching efficiency can be achieved through careful task scheduling…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-04 Lingda Li, Ari B. Hayes, Stephen A. Hackler, Eddy Z. Zhang, Mario Szegedy, Shuaiwen Leon Song

Streaming Graph Algorithms in the Massively Parallel Computation Model

We initiate the study of graph algorithms in the streaming setting on massive distributed and parallel systems inspired by practical data processing systems. The objective is to design algorithms that can efficiently process evolving graphs…

Data Structures and Algorithms · Computer Science 2025-01-20 Artur Czumaj, Gopinath Mishra, Anish Mukherjee

Scalable and Adaptive Parallel Training of Graph Transformer on Large Graphs

Graph foundation models have demonstrated remarkable adaptability across diverse downstream tasks through large-scale pretraining on graphs. However, existing implementations of the backbone model, graph transformers, are typically limited…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-21 Jun-Liang Lin, Kamesh Madduri, Mahmut Taylan Kandemir

Cimple: Instruction and Memory Level Parallelism

Modern out-of-order processors have increased capacity to exploit instruction level parallelism (ILP) and memory level parallelism (MLP), e.g., by using wide superscalar pipelines and vector execution units, as well as deep buffers for…

Programming Languages · Computer Science 2018-07-05 Vladimir Kiriansky, Haoran Xu, Martin Rinard, Saman Amarasinghe

Accelerating Task-based Iterative Applications

Task-based programming models have risen in popularity as an alternative to traditional fork-join parallelism. They are better suited to write applications with irregular parallelism that can present load imbalance. However, these…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-08-15 David Álvarez, Vicenç Beltran

Uniting Control and Data Parallelism: Towards Scalable Memory-Driven Dynamic Graph Processing

Control parallelism and data parallelism are mostly reasoned about and optimized as separate functions. Because of this, workloads that are irregular, fine-grain and dynamic such as dynamic graph processing become very hard to scale. An…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-03-08 Bibrak Qamar Chandio, Thomas Sterling, Prateek Srivastava

Thread Parallelism for Highly Irregular Computation in Anisotropic Mesh Adaptation

Thread-level parallelism in irregular applications with mutable data dependencies presents challenges because the underlying data is extensively modified during execution of the algorithm and a high degree of parallelism must be realized…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-05-19 Georgios Rokos, Gerard J. Gorman, Kristian Ejlebjerg Jensen, Paul H. J. Kelly

LiP-LLM: Integrating Linear Programming and dependency graph with Large Language Models for multi-robot task planning

This study proposes LiP-LLM: integrating linear programming and dependency graph with large language models (LLMs) for multi-robot task planning. In order for multiple robots to perform tasks more efficiently, it is necessary to manage the…

Robotics · Computer Science 2024-10-29 Kazuma Obata, Tatsuya Aoki, Takato Horii, Tadahiro Taniguchi, Takayuki Nagai