Related papers: Scheduling data flow program in xkaapi: A new affi…

Design and Experimental Evaluation of Algorithms for Optimizing the Throughput of Dispersed Computing

With growing deployment of Internet of Things (IoT) and machine learning (ML) applications, which need to leverage computation on edge and cloud resources, it is important to develop algorithms and tools to place these distributed…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-12-30 Xiangchen Zhao, Diyi Hu, Bhaskar Krishnamachari

Memory-aware Adaptive Scheduling of Scientific Workflows on Heterogeneous Architectures

The analysis of massive scientific data often happens in the form of workflows with interdependent tasks. When such a scientific workflow needs to be scheduled on a parallel or distributed system, one usually represents the workflow as a…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-31 Svetlana Kulagina, Anne Benoit, Henning Meyerhenke

A Graph-Partition-Based Scheduling Policy for Heterogeneous Architectures

In order to improve system performance efficiently, a number of systems choose to equip multi-core and many-core processors (such as GPUs). Due to their discrete memory these heterogeneous architectures comprise a distributed system within…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-02-27 Hao Wu, Daniel Lohmann, Wolfgang Schröder-Preikschat

Data-aware Dynamic Execution of Irregular Workloads on Heterogeneous Systems

Current approaches to scheduling workloads on heterogeneous systems with specialized accelerators often rely on manual partitioning, offloading tasks with specific compute patterns to accelerators. This method requires extensive…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-12 Zhenyu Bai, Dan Wu, Pranav Dangi, Dhananjaya Wijerathne, Venkata Pavan Kumar Miriyala, Tulika Mitra

Performant, Multi-objective Scheduling of Highly Interleaved Task Graphs on Heterogeneous System on Chip Devices

Performance-, power-, and energy-aware scheduling techniques play an essential role in optimally utilizing processing elements (PEs) of heterogeneous systems. List schedulers, a class of low-complexity static schedulers, have commonly been…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-12-17 Joshua Mack, Samet E. Arda, Umit Y. Ogras, Ali Akoglu

Generic algorithms for scheduling applications on heterogeneous multi-core platforms

We study the problem of executing an application represented by a precedence task graph on a parallel machine composed of standard computing cores and accelerators. Contrary to most existing approaches, we distinguish the allocation and the…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-11-20 Marcos Amaris, Giorgio Lucarelli, Clément Mommessin, Denis Trystram

Design of a GPU with Heterogeneous Cores for Graphics

Heterogeneous architectures can deliver higher performance and energy efficiency than symmetric counterparts by using multiple architectures tuned to different types of workloads. While previous works focused on CPUs, this work extends the…

Hardware Architecture · Computer Science 2026-02-02 Aurora Tomás, Juan Luis Aragón, Joan Manuel Parcerisa, Antonio González

A Hardware-based HEFT Scheduler Implementation for Dynamic Workloads on Heterogeneous SoCs

Non-uniform performance and power consumption across the processing elements (PEs) of heterogeneous SoCs increase the computation complexity of the task scheduling problem compared to homogeneous architectures. Latency of a software-based…

Hardware Architecture · Computer Science 2022-11-15 Alexander Fusco, Sahil Hassan, Joshua Mack, Ali Akoglu

A Survey of Real-time Scheduling on Accelerator-based Heterogeneous Architecture for Time Critical Applications

Accelerator-based heterogeneous architectures, such as CPU-GPU, CPU-TPU, and CPU-FPGA systems, are widely adopted to support the popular artificial intelligence (AI) algorithms that demand intensive computation. When deployed in real-time…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-20 An Zou, Yuankai Xu, Yinchen Ni, Jintao Chen, Yehan Ma, Jing Li, Christopher Gill, Xuan Zhang, Yier Jin

HARP: Orchestrating Automated Parallel Training on Heterogeneous GPU Clusters

With the rapid evolution of GPU architectures, the heterogeneity of model training infrastructures is steadily increasing. In such environments, effectively utilizing all available heterogeneous accelerators becomes critical for distributed…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-05 Antian Liang, Zhigang Zhao, Kai Zhang, Xuri Shi, Chuantao Li, Chunxiao Wang, Zhenying He, Yinan Jing, X. Sean Wang

STADI: Fine-Grained Step-Patch Diffusion Parallelism for Heterogeneous GPUs

The escalating adoption of diffusion models for applications such as image generation demands efficient parallel inference techniques to manage their substantial computational cost. However, existing diffusion parallelism inference schemes…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-16 Han Liang, Jiahui Zhou, Zicheng Zhou, Xiaoxi Zhang, Xu Chen

Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures: A Machine Learning Based Approach

This article presents an automatic approach to quickly derive a good solution for hardware resource partition and task granularity for task-based parallel applications on heterogeneous many-core architectures. Our approach employs a…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-10 Peng Zhang, Jianbin Fang, Canqun Yang, Chun Huang, Tao Tang, Zheng Wang

Accelerating Diffusion via Hybrid Data-Pipeline Parallelism Based on Conditional Guidance Scheduling

Diffusion models have achieved remarkable progress in high-fidelity image, video, and audio generation, yet inference remains computationally expensive. Nevertheless, current diffusion acceleration methods based on distributed parallelism…

Computer Vision and Pattern Recognition · Computer Science 2026-02-26 Euisoo Jung, Byunghyun Kim, Hyunjin Kim, Seonghye Cho, Jae-Gil Lee

QoS-aware Scheduling of Periodic Real-time Task Graphs on Heterogeneous Pre-occupied MECs

In latency-sensitive applications, efficient task scheduling is crucial for maintaining Quality of Service (QoS) while meeting strict timing constraints. This paper addresses the challenge of scheduling periodic tasks structured as directed…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-17 Ashutosh Shankar, Astha Kumari

A Foray into Efficient Mapping of Algorithms to Hardware Platforms on Heterogeneous Systems

Heterogeneous computing can potentially offer significant performance and performance per watt improvements over homogeneous computing, but the question "what is the ideal mapping of algorithms to architectures?" remains an open one. In the…

Hardware Architecture · Computer Science 2016-05-24 Oren Segal, Nasibeh Nasiri, Martin Margala

Priority-Aware Near-Optimal Scheduling for Heterogeneous Multi-Core Systems with Specialized Accelerators

To deliver high performance in power limited systems, architects have turned to using heterogeneous systems, either CPU+GPU or mixed CPU-hardware systems. However, in systems with different processor types and task affinities, scheduling…

Performance · Computer Science 2017-12-12 Zhuo Chen, Diana Marculescu

Heterogeneous CPU+GPU Stochastic Gradient Descent Algorithms

The widely-adopted practice is to train deep learning models with specialized hardware accelerators, e.g., GPUs or TPUs, due to their superior performance on linear algebra operations. However, this strategy does not employ effectively the…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-21 Yujing Ma, Florin Rusu

APEX: Asynchronous Parallel CPU-GPU Execution for Online LLM Inference on Constrained GPUs

Deploying large language models (LLMs) for online inference is often constrained by limited GPU memory, particularly due to the growing KV cache during auto-regressive decoding. Hybrid GPU-CPU execution has emerged as a promising solution…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-16 Jiakun Fan, Yanglin Zhang, Xiangchen Li, Dimitrios S. Nikolopoulos

Scheduling Fork-Join Task Graphs to Heterogeneous Processors

The scheduling of task graphs with communication delays has been extensively studied. Recently, new results for the common sub-case of fork-join shaped task graphs were published, including an EPTAS and polynomial algorithms for special…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-06 Huijun Wang, Oliver Sinnen

Scheduling on Hybrid Platforms: Improved Approximability Window

Modern platforms are using accelerators in conjunction with standard processing units in order to reduce the running time of specific operations, such as matrix operations, and improve their performance. Scheduling on such hybrid platforms…

Data Structures and Algorithms · Computer Science 2020-02-11 Vincent Fagnon, Imed Kacem, Giorgio Lucarelli, Bertrand Simon