Related papers: MAPA: Multi-Accelerator Pattern Allocation Policy …

TAPA: A Scalable Task-Parallel Dataflow Programming Framework for Modern FPGAs with Co-Optimization of HLS and Physical Design

In this paper, we propose TAPA, an end-to-end framework that compiles a C++ task-parallel dataflow program into a high-frequency FPGA accelerator. Compared to existing solutions, TAPA has two major advantages. First, TAPA provides a set of…

Hardware Architecture · Computer Science 2024-10-18 Licheng Guo, Yuze Chi, Jason Lau, Linghao Song, Xingyu Tian, Moazin Khatti, Weikang Qiao, Jie Wang, Ecenur Ustun, Zhenman Fang, Zhiru Zhang, Jason Cong

Joint Parameter-and-Bandwidth Allocation for Improving the Efficiency of Partitioned Edge Learning

To leverage data and computation capabilities of mobile devices, machine learning algorithms are deployed at the network edge for training artificial intelligence (AI) models, resulting in the new paradigm of edge learning. In this paper,…

Information Theory · Computer Science 2020-07-01 Dingzhu Wen, Mehdi Bennis, Kaibin Huang

TAPA-CS: Enabling Scalable Accelerator Design on Distributed HBM-FPGAs

Despite the increasing adoption of Field-Programmable Gate Arrays (FPGAs) in compute clouds, there remains a significant gap in programming tools and abstractions which can leverage network-connected, cloud-scale, multi-die FPGAs to…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-05 Neha Prakriya, Yuze Chi, Suhail Basalama, Linghao Song, Jason Cong

PAL: A Variability-Aware Policy for Scheduling ML Workloads in GPU Clusters

Large-scale computing systems are increasingly using accelerators such as GPUs to enable peta- and exa-scale levels of compute to meet the needs of Machine Learning (ML) and scientific computing applications. Given the widespread and…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-20 Rutwik Jain, Brandon Tran, Keting Chen, Matthew D. Sinclair, Shivaram Venkataraman

On Optimal Server Allocation for Moldable Jobs with Concave Speed-Up

A large proportion of jobs submitted to modern computing clusters and data centers are parallelizable and capable of running on a flexible number of computing cores or servers. Although allocating more servers to such a job results in a…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-06-17 Samira Ghanbarian, Arpan Mukhopadhyay, Ravi R. Mazumdar, Fabrice M. Guillemin

LAPA: Log-Domain Prediction-Driven Dynamic Sparsity Accelerator for Transformer Model

Attention-based Transformers have revolutionized natural language processing (NLP) and shown strong performance in computer vision (CV) tasks. However, as the input sequence varies, the computational bottlenecks in Transformer models…

Machine Learning · Computer Science 2025-12-10 Huizheng Wang, Hongbin Wang, Shaojun Wei, Yang Hu, Shouyi Yin

MARS: Exploiting Multi-Level Parallelism for DNN Workloads on Adaptive Multi-Accelerator Systems

Along with the fast evolution of deep neural networks, the hardware system is also developing rapidly. As a promising solution achieving high scalability and low manufacturing cost, multi-accelerator systems widely exist in data centers,…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-07-25 Guan Shen, Jieru Zhao, Zeke Wang, Zhe Lin, Wenchao Ding, Chentao Wu, Quan Chen, Minyi Guo

MAPP: a Scalable Multi-Agent Path Planning Algorithm with Tractability and Completeness Guarantees

Multi-agent path planning is a challenging problem with numerous real-life applications. Running a centralized search such as A* in the combined state space of all units is complete and cost-optimal, but scales poorly, as the state space…

Artificial Intelligence · Computer Science 2014-01-17 Ko-Hsin Cindy Wang, Adi Botea

All-in-One Image Coding for Joint Human-Machine Vision with Multi-Path Aggregation

Image coding for multi-task applications, catering to both human perception and machine vision, has been extensively investigated. Existing methods often rely on multiple task-specific encoder-decoder pairs, leading to high overhead of…

Computer Vision and Pattern Recognition · Computer Science 2024-10-01 Xu Zhang, Peiyao Guo, Ming Lu, Zhan Ma

Distributed Planning with Asynchronous Execution with Local Navigation for Multi-agent Pickup and Delivery Problem

We propose a distributed planning method with asynchronous execution for multi-agent pickup and delivery (MAPD) problems for environments with occasional delays in agents' activities and flexible endpoints. MAPD is a crucial problem…

Multiagent Systems · Computer Science 2023-02-21 Yuki Miyashita, Tomoki Yamauchi, Toshiharu Sugawara

Accelerator-driven Data Arrangement to Minimize Transformers Run-time on Multi-core Architectures

The increasing complexity of transformer models in artificial intelligence expands their computational costs, memory usage, and energy consumption. Hardware acceleration tackles the ensuing challenges by designing processors and…

Hardware Architecture · Computer Science 2023-12-21 Alireza Amirshahi, Giovanni Ansaloni, David Atienza

A Novel Process Mapping Strategy in Clustered Environments

Nowadays, the number of available processing cores within the computing nodes used in recent clustered environments is growing rapidly. Despite this trend, the number of available network interfaces in such computing…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-07-13 Mohsen Soryani, Morteza Analoui, Ghobad Zarrinchian

CARMA: Contention-aware Auction-based Resource Management in Architecture

As the number of resources on chip multiprocessors (CMPs) increases, the complexity of how to best allocate these resources increases drastically. Because the higher number of applications makes the interaction and impacts of various memory…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-07-20 Farshid Farhat, Diman Zad Tootaghaj

GMA: A Pareto Optimal Distributed Resource-Allocation Algorithm

To address the rising demand for strong packet delivery guarantees in networking, we study a novel way to perform graph resource allocation. We first introduce allocation graphs, in which nodes can independently set local resource limits…

Networking and Internet Architecture · Computer Science 2023-02-01 Giacomo Giuliari, Marc Wyss, Markus Legner, Adrian Perrig

Better Process Mapping and Sparse Quadratic Assignment

Communication and topology aware process mapping is a powerful approach to reduce communication time in parallel applications with known communication patterns on large, distributed memory systems. We address the problem as a quadratic…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-07-23 Christian Schulz, Jesper Larsson Träff, Konrad von Kirchbach

GPU-Accelerated Algorithms for Process Mapping

Process mapping asks to assign vertices of a task graph to processing elements of a supercomputer such that the computational workload is balanced while the communication cost is minimized. Motivated by the recent success of GPU-based graph…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-16 Petr Samoldekin, Christian Schulz, Henning Woydt

Intelligent colocation of HPC workloads

Many HPC applications suffer from a bottleneck in the shared caches, instruction execution units, I/O or memory bandwidth, even though the remaining resources may be underutilized. It is hard for developers and runtime systems to ensure…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-17 Felippe V. Zacarias, Vinicius Petrucci, Rajiv Nishtala, Paul Carpenter, Daniel Mossé

Dynamic Parameter Allocation in Parameter Servers

To keep up with increasing dataset sizes and model complexity, distributed training has become a necessity for large machine learning tasks. Parameter servers ease the implementation of distributed parameter management---a key concern in…

Machine Learning · Computer Science 2020-07-06 Alexander Renz-Wieland, Rainer Gemulla, Steffen Zeuch, Volker Markl

Optimized Spatial Architecture Mapping Flow for Transformer Accelerators

Recent innovations in Transformer-based large language models have significantly advanced the field of general-purpose neural language understanding and generation. With billions of trainable parameters, deployment of these large models…

Hardware Architecture · Computer Science 2024-10-11 Haocheng Xu, Faraz Tahmasebi, Ye Qiao, Hongzheng Tian, Hyoukjun Kwon, Sitao Huang

MAPPER: Multi-Agent Path Planning with Evolutionary Reinforcement Learning in Mixed Dynamic Environments

Multi-agent navigation in dynamic environments is of great industrial value when deploying a large-scale fleet of robots in real-world applications. This paper proposes a decentralized partially observable multi-agent path planning with…

Robotics · Computer Science 2020-08-03 Zuxin Liu, Baiming Chen, Hongyi Zhou, Guru Koushik, Martial Hebert, Ding Zhao